Abstract

Technological advances have dramatically expanded our ability to probe multi-neuronal dynamics and 
connectivity in the brain. However, our ability to extract a simple conceptual understanding from complex 
data is increasingly hampered by the lack of theoretically principled data analytic procedures, as well as
theoretical frameworks for how circuit connectivity and dynamics can conspire to generate emergent behavioral
and cognitive functions. We review and outline potential avenues for progress, including new theories of high 
dimensional data analysis, the need to analyze complex artificial networks, and methods for analyzing entire 
spaces of circuit models, rather than one model at a time. Such interplay between experiments, data analysis 
and theory will be indispensable in catalyzing conceptual advances in the age of large-scale neuroscience. 


“Things should be as simple as possible, but not simpler.”
- Albert Einstein


Introduction 

Experimental neuroscience is entering a golden age marked by the advent of remarkable new methods 
enabling us to record ever increasing numbers of neurons, and measure brain connectivity at
various levels of resolution, sometimes measuring both connectivity and dynamics
in the same set of neurons. This recent thrust of technology development is spurred by the hope
that an understanding of how the brain gives rise to sensations, actions and thoughts will lurk within the 
resulting brave new world of complex large-scale data sets. However, the question of how one can extract a 
conceptual understanding from data remains a significant challenge for our field. Major issues involve: (1) 
What does it even mean to conceptually understand “how the brain works?” (2) Are we collecting the right 
kinds and amounts of data to derive such understanding? (3) Even if we could collect any kind of detailed 
measurements about neural structure and function, what theoretical and data analytic procedures would we 
use to extract conceptual understanding from such measurements? These are profound questions to which 
we do not have crisp, detailed answers. Here we merely present potential routes towards the beginnings of 
progress on these fronts. 


Understanding as a journey from complexity to simplicity 

First, the vague question of “how the brain works” can be meaningfully reduced to the more precise, and 
proximally answerable question of how do the connectivity and dynamics of distributed neural circuits give 
rise to specific behaviors and computations? But what would a satisfactory answer to this question look like? 
A detailed, predictive circuit model down to the level of ion-channels and synaptic vesicles within individual 
neurons, while remarkable, may not yield conceptual understanding in any meaningful human sense. For 
example, if simulating this detailed circuit were the only way we could predict behavior, then we would be 
loath to say that we understand how behavior emerges from the brain. 


Email addresses: prgao@stanford.edu (Peiran Gao), sganguli@stanford.edu (Surya Ganguli) 


Preprint submitted to Curr. Op. in Neurobiology 


March 31, 2015 





Instead, a good benchmark for understanding can be drawn from the physical sciences. Feynman
articulated the idea that we understand a physical theory if we can say something about the solutions to the
underlying equations of the theory without actually solving those equations. For example, we understand 
aspects of fluid mechanics because we can say many things about specific fluid flows, without having to 
numerically solve the Navier-Stokes equations in every single case. Similarly, in neuroscience, understanding 
will be found when we have the ability to develop simple coarse-grained models, or better yet a hierarchy 
of models, at varying levels of biophysical detail, all capable of predicting salient aspects of behavior at 
varying levels of resolution. In traversing this hierarchy, we will obtain an invaluable understanding of which 
biophysical details matter, and more importantly, which don’t, for any given behavior. Thus our goal should 
be to find simplicity amidst complexity, while of course keeping in mind Einstein’s famous dictum quoted 
above. 


How many neurons are enough: simplicity and complexity in multineuronal dynamics 

What kinds and amounts of data are required to arrive at simple but accurate coarse grained models? In 
the world of large scale recordings, where we do not have access to simultaneous connectivity information, 
the focus has been on obtaining a state-space description of the dynamics of neural circuits through various 
dimensionality reduction methods. This body of work raises a key conceptual issue
permeating much of systems neuroscience, namely, what precisely can we infer about neural circuit dynamics 
and its relation to cognition and behavior while measuring only an infinitesimal fraction of behaviorally 
relevant neurons? For example, given a doubling time of about 7.4 years [18] in the number of neurons we
can simultaneously measure at single cell, single spike-time resolution, we would have to wait more than 100
years before we can observe the $O(10^6)$ to $O(10^9)$ neurons typically present in full mammalian circuits controlling
complex behaviors. Thus, systems neuroscience will remain for the foreseeable future within the vastly
undersampled measurement regime, so we need a theory of neuronal data analysis in this regime. Such theory 
is essential for (1) guiding the biological interpretation of complex multivariate data analytic techniques, (2) 
efficiently designing future large scale recording experiments, and (3) developing theoretically principled data 
analysis algorithms appropriate for the degree of subsampling. 
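
The waiting-time estimate above follows from simple exponential-growth arithmetic, sketched below. The starting point of roughly 100 simultaneously recorded neurons is an illustrative assumption, not a figure taken from the text, and `years_to_reach` is a hypothetical helper name.

```python
import math

def years_to_reach(target_neurons, current_neurons, doubling_time_years=7.4):
    """Years until `target_neurons` can be recorded, assuming steady exponential growth."""
    doublings = math.log2(target_neurons / current_neurons)
    return doublings * doubling_time_years

# Assumption (illustrative only): about 100 neurons recorded simultaneously today.
current = 100
wait_low = years_to_reach(1e6, current)    # lower end of the O(10^6) to O(10^9) range
wait_high = years_to_reach(1e9, current)   # upper end of the range
print(f"roughly {wait_low:.0f} to {wait_high:.0f} years")
```

Under this assumption the wait is on the order of a century or more across the whole range, which is why the undersampled regime is the relevant one for the foreseeable future.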

A clue to the beginnings of this theory lies in an almost universal result occurring across many experiments 
in which neuroscientists tightly control behavior, record many trials, and obtain trial averaged neuronal 
firing rate data from hundreds of neurons: in such experiments, the dimensionality (i.e. number of principal 
components required to explain a fixed percentage of variance) of neural data turns out to be much less 
than the number of recorded neurons (Fig. 1). Moreover, when dimensionality reduction procedures are
used to extract neuronal state dynamics, the resulting low dimensional neural trajectories yield a remarkably
insightful dynamical portrait of circuit computation.
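
The notion of dimensionality used here (the number of principal components needed to explain a fixed fraction of variance) can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis pipeline of any cited study; the 90% variance threshold, the function name `dimensionality`, and the simulated firing rates are all assumptions.

```python
import numpy as np

def dimensionality(rates, var_fraction=0.9):
    """Number of principal components needed to explain `var_fraction` of the variance.

    rates: array of shape (n_samples, n_neurons), e.g. trial-averaged firing rates
    stacked over conditions and time points.
    """
    centered = rates - rates.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the variance along each component.
    s = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # First index at which the cumulative explained variance reaches the target.
    return int(np.searchsorted(explained, var_fraction) + 1)

# Synthetic example (assumption): 100 neurons driven by 5 shared latent signals.
rng = np.random.default_rng(0)
n_samples, n_neurons, n_latent = 400, 100, 5
latents = rng.standard_normal((n_samples, n_latent))
mixing = rng.standard_normal((n_latent, n_neurons))
rates = latents @ mixing + 0.05 * rng.standard_normal((n_samples, n_neurons))

dim = dimensionality(rates)
print(dim, "dimensions for", n_neurons, "recorded neurons")
```

In this toy setting the estimated dimensionality is bounded by the number of latent signals rather than by the number of neurons, mirroring the empirical observation described above.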

These results raise several profound and timely questions: what is the origin of the underlying simplicity 
implied by the low dimensionality of neuronal recordings? How can we trust the dynamical portraits that we 
extract from so few neurons? Would the dimensionality increase if we recorded more neurons? Would the 
portraits change? Without an adequate theory, it is impossible to quantitatively answer, or even precisely 
formulate, these important questions. We have recently started to develop such a theory. Central
to this theory is the mathematically well-defined notion of neuronal task complexity (NTC). Intuitively, the
NTC measures the volume of the manifold of task parameters (see Fig. 2A for the special case of simple
reaches) measured in units of the neuronal population autocorrelation scale across each task parameter.
Thus the NTC in essence measures how many neuronal activity patterns could possibly appear during the
course of an experiment given that task parameters have a limited extent and neuronal activity patterns
vary smoothly across task parameters (Fig. 2B). With the mathematical definition of the NTC in hand, we
derive that (1) the dimensionality of neuronal data is upper bounded by the NTC, and (2) if the neural data
manifold is sufficiently randomly oriented, we can accurately recover dynamical portraits when the number
of observed neurons is proportional to the log of the NTC (Fig. 2C).
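
The second result can be illustrated with a toy simulation: a smooth, randomly oriented, low dimensional manifold keeps its geometry when only a random subset of neurons is observed. The sketch below is not the authors' analysis; the ring-shaped manifold, the sizes N, M and T, and the rescaling by sqrt(N/M) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, T = 2000, 200, 40   # hypothetical circuit size, recorded neurons, manifold samples

# A smooth one-dimensional ring, randomly oriented in N-dimensional firing-rate space.
theta = np.linspace(0, 2 * np.pi, T, endpoint=False)
ring = np.stack([np.cos(theta), np.sin(theta)], axis=1)      # shape (T, 2)
embedding = rng.standard_normal((2, N)) / np.sqrt(N)
full = ring @ embedding                                      # shape (T, N)

# Record a random subset of M neurons; rescale so distances are comparable.
recorded = rng.choice(N, size=M, replace=False)
sub = full[:, recorded] * np.sqrt(N / M)

def pairwise_distances(x):
    """Euclidean distances between all pairs of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

d_full, d_sub = pairwise_distances(full), pairwise_distances(sub)
off_diagonal = ~np.eye(T, dtype=bool)
distortion = np.max(np.abs(d_sub[off_diagonal] / d_full[off_diagonal] - 1))
print(f"worst-case fractional distortion of pairwise distances: {distortion:.3f}")
```

Varying M in this sketch while holding the manifold fixed gives a feel for how the distortion depends on the number of recorded neurons rather than on the circuit size N.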

These theorems have significant implications for the interpretation and design of large-scale experiments. 
First, it is likely that in a wide variety of experiments, the origin of low dimensionality is due to a small NTC, 
a hypothesis that we have verified in recordings from the motor and premotor cortices of monkeys performing 
a simple 8 direction reach task [43]. In any such scenario, simply increasing the number of recorded neurons,
without a concomitant increase in task complexity, will not lead to richer, higher dimensional datasets; indeed,
data dimensionality will be independent of the number of recorded neurons. Moreover, we confirmed in motor
cortical data our theoretically predicted result that the number of recorded neurons should be proportional
to the logarithm of the NTC to accurately recover dynamical portraits of neural state trajectories. This
is excellent news: while we must make tasks more complex to obtain richer, more insightful datasets, we
need not record from many more neurons within a brain region to accurately recover its internal state-space
dynamics.

Figure 1: In many experiments (e.g. in insect [23] [20] [24] [25] [26] olfactory systems, mammalian olfactory [27] [26], prefrontal
[28] [29] [21] [30] [22], motor and premotor [31] [32], somatosensory [33], visual [34] [35], hippocampal [36], and brain stem
systems) a much smaller number of dimensions than the number of recorded neurons captures a large amount of variance in
neural firing rates.


Towards a theory of single trial data analysis 

The above work suggests that the proximal route for progress lies not in recording more neurons alone, but 
in designing more complex tasks and stimuli. However, with such increased complexity, the same behavioral 
state or stimulus may rarely be revisited, precluding the possibility of trial averaging as a method for data 
analysis. Therefore it is essential to extend our theory to the case of single trial analysis. A simple formulation 
of the problem is as follows: suppose we have a K dimensional manifold of behavioral states (or stimuli), 
where K is not necessarily known, and the animal explores P states in succession. The behavior is controlled 
by a circuit with N neurons but we measure only M of them. Furthermore, each neuron is noisy with a 
finite SNR, reflecting single trial variability. For what values of M, N , P, K , and the SNR can we accurately 
(1) estimate the dimensionality K of neural data, and (2) accurately decode behavior on single trials? We 
have solved this problem analytically in the case in which noisy neural activity patterns reflecting P discrete 
stimuli lie near a K dimensional subspace (Fig. 3A,B). We find, roughly, that the relations M, P > K and
$\mathrm{SNR}\sqrt{MP} > K$ are sufficient. Thus, it is an intrinsic measure of neural complexity K, and not the total
number of neurons N, that sets a lower bound on how many neurons M and stimuli P we must observe
at a given SNR for accurate single trial analyses. Moreover, we have generalized this analysis to learning
dynamical systems (Fig. 3C,D).
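
A toy simulation can make the role of the combination SNR, M, P and K concrete. The sketch below estimates dimensionality by counting singular values above the noise bulk edge, a standard random matrix heuristic that is swapped in here for illustration and is not necessarily the denoising algorithm analyzed in the text. It assumes SNR means the ratio of per-neuron signal variance to noise variance (the text does not state its definition), and the function names and the 5% safety margin are hypothetical.

```python
import numpy as np

def estimate_rank(data, noise_std=1.0, margin=1.05):
    """Count singular values above the noise bulk edge (sqrt(M) + sqrt(P)) * noise_std."""
    M, P = data.shape
    edge = noise_std * (np.sqrt(M) + np.sqrt(P))
    s = np.linalg.svd(data, compute_uv=False)
    return int(np.sum(s > margin * edge))

def simulate(M, P, K, snr, rng):
    """M recorded neurons by P single trials: rank-K signal plus unit-variance noise."""
    loadings = rng.standard_normal((M, K))
    # Scale latents so that the per-entry signal variance equals `snr` (assumed definition).
    latents = rng.standard_normal((K, P)) * np.sqrt(snr / K)
    return loadings @ latents + rng.standard_normal((M, P))

rng = np.random.default_rng(2)
M, P, K = 200, 400, 5
for snr in (0.2, 0.005):
    criterion = snr * np.sqrt(M * P) / K   # values above 1 mean SNR * sqrt(MP) > K
    k_hat = estimate_rank(simulate(M, P, K, snr, rng))
    print(f"snr={snr}: SNR*sqrt(MP)/K = {criterion:.2f}, estimated K = {k_hat}")
```

With these settings the first case sits well on the favorable side of the condition and the second well on the unfavorable side, so comparing the two estimates gives a qualitative picture of the transition in performance.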

Both our preliminary analyses reveal the existence of phase transitions in performance as a function of (1) 
the number of recorded neurons, and (2) the amount of recording time, stimuli, or behavioral states. Only 
on the correct side of the phase boundary are accurate dimensionality, dynamics estimation and single trial 
decoding possible. Such phase transitions are common in high dimensional data analysis problems,
for example in compressed sensing and matrix recovery [52]. They reveal the beautiful fact
that we can recover a lot of information about large objects (vectors or matrices) using surprisingly few
measurements when we have seemingly weak prior information, like sparsity or low-rank structure (such
ideas have been reviewed in a neuroscience context). Moreover, in Fig. 3 we see that we can move along the phase
boundary by trading off number of recorded neurons with recording time.
Figure 2: (A) For a monkey reaching to different directions, the trial averaged behavioral states visited by the arm throughout
the experiment are parameterized by a cylinder with two coordinates, reach angle $\theta$, and time into the reach t. (B) Trial
averaged neural data is an embedding of the task manifold into firing rate space. The number of dimensions explored by the 
neural data manifold is limited by its volume and its curvature (but not the total number of neurons in the motor cortex), 
with smoother embeddings exploring fewer dimensions. The NTC is a mathematically precise upper bound on the number 
of dimensions of the neural data manifold given the volume of the task parameter manifold and a smoothness constraint on 
the embedding. (C) If the neural data manifold is low dimensional and randomly oriented w.r.t. single neuron axes, then its 
shadow onto a subset of recorded neurons will preserve its geometric structure. We have shown, using random projection theory 
[38] [39] [40] that to preserve neural data manifold geometries with fractional error $\epsilon$, one needs to record $M > \frac{1}{\epsilon^2} K \log(\mathrm{NTC})$
neurons. The figure illustrates a K = 1 dimensional neural manifold in N = 3 neurons, and we only record M = 2 neurons. 
Thus, fortunately, the intrinsic complexity of the neural data manifold (small), not the number of neurons in the circuit (large) 
determines how many neurons we need to record. 


Thus, to guide future large-scale experimental design, it will be exceedingly important to determine the 
position of these phase boundaries under increasingly realistic biological assumptions, for example exploring 
the roles of spiking variability, noise correlations, sparsity, cell types, and network connectivity constraints, 
and how they impact our ability to uncover true network dynamics and single trial decodes in the face of 
subsampling. In essence, we need to develop a Rosetta stone connecting biophysical network dynamics to 
statistics. This dictionary will teach us when and how the learned parameters of statistical models fit to a 
subset of recorded neurons ultimately encode the collective dynamics of the much larger, unobserved neural 
circuit containing them - an absolutely fundamental question in neuroscience. 


Understanding complex networks with complete information 

As we increasingly obtain information about both the connectivity and dynamics of neural circuits, we
must ask how we should use this information. As a way to sharpen our ideas, it can be useful
to engage in a thought experiment in which experimental neuroscience eventually achieves complete success, 
in enabling us to measure detailed connectivity, dynamics and plasticity in full neural sub-circuits during 
behavior. How then would we extract understanding from such rich data? Moreover, could we arrive at 
this same understanding without collecting all the data, perhaps even only collecting data within reach in 
the near future? To address this thought experiment, it is useful to turn to advances in computer science, 
where deep or recurrent neural networks, consisting of multiple layers of cascaded nonlinearities, have made 
a resurgence as the method of choice for solving a range of difficult computational problems. Indeed, deep 
learning has led to advances in object detection [58] [59], face recognition
[60] [61], speech recognition [62], language translation [63], genomics [64], microscopy [65], and even modeling
biological neural responses.

Each of these networks can solve a complex computational problem. Moreover, we know the full network 
connectivity, the dynamics of every single neuron, the plasticity rule used to train the network, and indeed 
the entire developmental experience of the network, in terms of its exposure to training stimuli. Virtually 
any experiment we wish to do on these networks, we can do. Yet a meaningful understanding of how 
these networks work still eludes us, as does a sense of what a suitable benchmark for such understanding would be.
Following Feynman’s guideline for understanding physical theories, can we say something about the behavior
of deep or recurrent artificial neural networks without actually simulating them in detail? More importantly,
could we arrive at these statements of understanding without measuring every detail of the network,
and what is the minimal set of measurements we would need? We do not believe that understanding these
networks will directly inform us about how the much more complex biological neural networks operate. However,
even in an artificial setting, directly confronting the question of what it means to understand how complex
distributed circuits compute, and what kinds of experiments and data analytic procedures we would need to
arrive at this understanding, could have a dramatic impact on the questions we ask, experiments we design,
and the data analysis we do, in the pursuit of this same understanding in biological neural circuits.

Figure 3: (A) and (B) The inferred dimensionality and held-out single trial decoding error as a function of P simulated training
examples (single trials) and M recorded neurons in a situation where stimuli are encoded in a K = 20 dimensional subspace
in a network of N = 5000 neurons, with SNR = 5. Inference was performed using low rank matrix denoising, and our new
analysis of this algorithm reveals a sufficient condition for accurate inference in terms of the SNR, K, M, P and N.
The black curve in (A) and (B) reflects the theoretically predicted phase boundary in the P, M plane separating accurate from
inaccurate inference. This condition simplifies in the experimentally relevant regime $K, M \ll N$ and $K \ll M, P$ to $\mathrm{SNR}\sqrt{MP} > K$.
(C) and (D) Learning the dimensionality and dynamics, via subspace identification, of a linear neural network of size
N = 5000 from spontaneous noise driven activity. The low-rank connectivity of the network forces the system to lie in a K = 6
dimensional subspace. Performance is measured as a function of the number of recorded neurons M and recording time T.
By combining and extending time series random matrix theory [46], low-rank perturbation theory, and noncommutative
probability theory, we have derived a theoretically predicted phase boundary (black curve in (C) and (D)) that matches
simulations. In (D), left, the subspace overlap is the correlation between the inferred subspace and the true subspace, with 1
being perfect correlation, or overlap. In (D), on the right, dynamics (eigenvalues) are correctly predicted only on the right side
of the boundary (red dots are true eigenvalues, blue crosses are estimated eigenvalues).

The theory of deep learning is still in its infancy, but some examples of general statements one can make 
about deep circuits without actually simulating them include how their functional complexity scales with 
depth, how their synaptic weights, over time, acquire statistical structures in inputs [72] [73], and
how their learning dynamics is dominated by saddle points, not local minima [74]. However, much more
work at the intersection of experimental and theoretical neuroscience and machine learning will be required 
before we can address the intriguing thought experiment of what we should do if we could measure anything 
we wanted. 


Understanding not a single model, but the space of all possible models 

An even higher level of understanding is achieved when we develop not just a single model that explains 
a data set, but rather understand the space of all possible models consistent with the data. Such an 
understanding can place existing biological systems within their evolutionary context, leading to insights 
about why they are structured the way they are, and can reveal general principles that transcend any 
particular model. Inspiring examples for neuroscientists can be found not only within neuroscience, but 
also in allied fields. For example, one study derived a single Boolean network model of the yeast cell-cycle control
network, while another developed methods to count and sample from the space of all networks that realize the
yeast cell-cycle. This revealed an astronomical number of possible networks consistent with the data, but 
only 3% of these networks were more robust than the one chosen by nature, revealing potential evolutionary 
pressure towards robustness. In protein folding, theoretical work analyzed, in toy models, the space of all
possible amino acid sequences that give rise to a given fold; the number of such sequences is the designability 
of the fold. Theory revealed that typical folds with shapes similar to those occurring in nature are highly 
designable, and therefore more easily found by evolution. Moreover, designable folds are thermodynamically 
stable and atypical in shape, revealing general principles relating sequence to structure. In the
realm of short-term sequence memory, the idea of liquid state machines posited that generic neural
circuits could convert temporal information into instantaneous spatial patterns of activity, but theoretical
work [82] [83] [84] revealed general principles relating circuit connectivity to memory, highlighting the role
of non-normal and orthogonal network connectivities in achieving robust sequence memory. In the realm of
long-term memory, seminal work revealed that it is essential to treat synapses as entire dynamical systems in 
their own right, exhibiting a particular synaptic model [85], while further theoretical work [86] analyzed the 
space of all possible synaptic dynamical systems, revealing general principles relating synaptic structure to 
function. Furthermore, conductance-based models of central pattern generators revealed that highly disparate
conductance levels can yield similar behavior, suggesting that observed correlations in conductance levels
across animals [58] could reflect a signature of homeostatic design.

These examples all show that studying the space of models consistent with a given data set or behavior can 
greatly expand our conceptual understanding. Further work along these lines within the context of neuronal
networks is likely to yield important insights. For example, suppose we could understand the space of
all possible deep or recurrent neural networks that solve a given computational task. Which observable 
aspects of the connectivity and dynamics are universal across this space, and which are highly variable 
across individual networks? Are the former observables the ones we should focus on measuring in real 
biological circuits solving the same task? Are the latter observables less relevant and more indicative of 
historical accidents over the time course of learning? 

In summary, there are great challenges and opportunities for generating advances in high dimensional 
data analysis and neuronal circuit theory that can aid in not only responding to the need to interpret existing 
complex data, but also in driving the questions we ask, and the design of large-scale experiments we do to 
answer these questions. Such advances in theory and data analysis will be required to transport us from 
the “brave new world, that has such [complex technology] in’t” [50] and deliver us to the promised land of 
conceptual understanding. 